PROV-man: A PROV-compliant toolkit for provenance management
Abstract
Discoveries in modern science can take years and involve the contribution of large amounts of data, many people, and various tools. Although good scientific practice dictates that findings should be reproducible, in practice very few automated tools support traceability of the scientific method employed, particularly when various experimental environments are involved at different research phases. Data provenance tracking approaches can play a major role in addressing many of these challenges. These approaches propose ways to capture, manage, and use provenance information to support the traceability of scientific methods in heterogeneous environments. PROV is a W3C standard that provides a comprehensive model for data and semantics representation, with common vocabularies and rich concepts to describe provenance. Nevertheless, it is difficult for domain scientists to understand and adopt all the richness provided by PROV. In this paper we describe the design and implementation of the provenance manager PROV-man, a PROV-compliant framework that helps scientists integrate provenance capabilities into their data analysis tools. PROV-man provides functionality to create and manipulate provenance data in a consistent manner and ensures its permanent storage. It also provides a set of interfaces to serialize and export provenance data into various data formats, which supports interoperability. The open architecture of PROV-man, consisting of an API and a configurable database, allows for easy deployment within existing and newly developed software tools. The paper presents examples illustrating the usage of PROV-man. The first example illustrates how to create and manipulate provenance data of an online newspaper article using PROV-man. The second example demonstrates and evaluates the PROV-man implementation in a more complex case: the collection of provenance data about biomedical data analysis activities that are carried out using a distributed computing infrastructure.
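The idea of creating provenance records consistently and then exporting them can be illustrated with a minimal sketch. This is not the PROV-man API, which is not shown in the abstract; it is a standard-library Python illustration that assembles a dictionary laid out in the style of the PROV-JSON serialization. The `ex` namespace, the identifiers `ex:article`, `ex:writing`, and `ex:journalist`, and all function names are hypothetical, chosen only to echo the newspaper-article example.

```python
import json

# PROV element kinds and a few PROV relations, named as in PROV-JSON.
ELEMENTS = ("entity", "activity", "agent")
RELATIONS = ("wasGeneratedBy", "wasAssociatedWith", "wasAttributedTo")


def new_document(prefixes):
    """Create an empty PROV-JSON-style document with the given prefixes."""
    doc = {"prefix": dict(prefixes)}
    for section in ELEMENTS + RELATIONS:
        doc[section] = {}
    return doc


def add_element(doc, kind, identifier, attributes=None):
    """Declare an entity, activity, or agent under its identifier."""
    if kind not in ELEMENTS:
        raise ValueError("unknown element kind: " + kind)
    doc[kind][identifier] = dict(attributes or {})


def add_relation(doc, relation, roles):
    """Add a relation, rejecting references to undeclared identifiers."""
    if relation not in RELATIONS:
        raise ValueError("unknown relation: " + relation)
    declared = set()
    for kind in ELEMENTS:
        declared.update(doc[kind])
    missing = sorted(set(roles.values()) - declared)
    if missing:
        raise ValueError("undeclared identifiers: " + ", ".join(missing))
    # Relations get sequential blank-node style keys (_:r1, _:r2, ...).
    count = sum(len(doc[name]) for name in RELATIONS)
    doc[relation]["_:r%d" % (count + 1)] = dict(roles)


def build_article_provenance():
    """Describe a hypothetical article written by a hypothetical journalist."""
    doc = new_document({"ex": "http://example.org/"})
    add_element(doc, "entity", "ex:article")
    add_element(doc, "activity", "ex:writing")
    add_element(doc, "agent", "ex:journalist")
    add_relation(doc, "wasGeneratedBy",
                 {"prov:entity": "ex:article", "prov:activity": "ex:writing"})
    add_relation(doc, "wasAssociatedWith",
                 {"prov:activity": "ex:writing", "prov:agent": "ex:journalist"})
    add_relation(doc, "wasAttributedTo",
                 {"prov:entity": "ex:article", "prov:agent": "ex:journalist"})
    return doc


doc = build_article_provenance()
serialized = json.dumps(doc, indent=2, sort_keys=True)
print(serialized)
```

The reference check in `add_relation` is only a stand-in for the kind of consistency a provenance manager might enforce; a real toolkit would also handle persistent storage and further export formats, which this sketch omits.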
Similar resources
Interoperability for Provenance-aware Databases using PROV and JSON
Since its inception, the PROV standard has been widely adopted as a standardized exchange format for provenance information. Surprisingly, this standard is currently not supported by provenance-aware database systems, limiting their interoperability with other provenance-aware systems. In this work we introduce techniques for exporting database provenance as PROV documents, importing PROV graphs ...
JSON and its use in Semantic Web
The semantic web has evolved over the current web and aims to provide a web that allows for easy retrieval and accessing of information by both man and machine. It provides for a wide variety of technology stacks, language standards, and software components which help both man and machine to access data easily. Intelligent information retrieval and the credibility of data are managed in semantic...
A Software Framework for Data Provenance
Data provenance refers to the historical record of the derivation of the data, allowing the reproduction of experiments, interpretation of results and identification of problems through the analysis of the processes that originated the data. Data provenance contributes to the evaluation of experiments. This paper presents a framework for data provenance using the W3C provenance data model, call...
PROV-O-Viz - Understanding the Role of Activities in Provenance
This paper presents PROV-O-Viz, a Web-based visualization tool for PROV-based provenance traces coming from various sources, that leverages Sankey Diagrams to reflect the flow of information through activities. We briefly discuss the advantages of this approach compared to other provenance visualization tools. PROV-O-Viz has already been used to visualize provenance traces generated by very dif...
D-PROV: Extending the PROV Provenance Model with Workflow Structure
This paper presents an extension to the W3C PROV provenance model, aimed at representing process structure. Although the modelling of process structure is out of the scope of the PROV specification, it is beneficial when capturing and analysing the provenance of data that is produced by programs or other formally encoded processes. In the paper, we motivate the need for such an extended model in t...
Journal: PeerJ PrePrints
Volume: 3
Publication date: 2015